Memory-Efficient GroupBy-Aggregate using Compressed Buffer Trees
Abstract
Memory is rapidly becoming a precious resource in many data processing environments. This paper introduces a new data structure called a Compressed Buffer Tree (CBT). Using a combination of buffering, compression, and lazy aggregation, CBTs can improve the memory efficiency of the GroupBy-Aggregate abstraction, which forms the basis of many data processing models such as MapReduce and databases. We evaluate CBTs in the context of MapReduce aggregation and show that CBTs can provide significant advantages over existing hash-based aggregation techniques: up to 2× less memory and 1.5× the throughput, at the cost of 2.5× the CPU usage.
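The three ingredients named in the abstract (buffering, compression, and lazy aggregation) can be illustrated with a small sketch. This is not the CBT described in the paper: it is a flat, single-level toy that buffers incoming pairs, compresses partially aggregated runs with `zlib`, and defers the final merge until results are requested. The class name `LazyCompressedAggregator`, the `buffer_limit` parameter, and the helper `word_count` are hypothetical names chosen for illustration, and summation is assumed as the aggregate function.

```python
import json
import zlib
from collections import defaultdict


class LazyCompressedAggregator:
    """Toy sum-by-key aggregator: buffer, compress, merge lazily.

    Assumption: a flat list of compressed runs stands in for the
    tree structure of a real Compressed Buffer Tree.
    """

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit  # hypothetical tuning knob
        self.buffer = []                  # recent (key, value) pairs, uncompressed
        self.runs = []                    # compressed partial aggregates

    def insert(self, key, value):
        # Buffering: appending is cheap; no per-key lookup happens here.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self._spill()

    def _spill(self):
        # Partially aggregate the buffer, then compress the sorted result.
        partial = defaultdict(int)
        for key, value in self.buffer:
            partial[key] += value
        payload = json.dumps(sorted(partial.items())).encode("utf-8")
        self.runs.append(zlib.compress(payload))
        self.buffer.clear()

    def finalize(self):
        # Lazy aggregation: runs are only decompressed and merged now.
        if self.buffer:
            self._spill()
        totals = defaultdict(int)
        for run in self.runs:
            for key, value in json.loads(zlib.decompress(run).decode("utf-8")):
                totals[key] += value
        return dict(totals)


def word_count(words, buffer_limit=4):
    """Count occurrences of each word using the toy aggregator."""
    agg = LazyCompressedAggregator(buffer_limit)
    for word in words:
        agg.insert(word, 1)
    return agg.finalize()
```

Calling `word_count(["a", "b", "a", "c", "b", "a"])` spills one run after the first four words and a second run at finalization, then merges both into the per-word totals. The trade-off mirrors the one stated in the abstract: compressed runs occupy less memory than a live hash table, but compressing and decompressing them costs extra CPU.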
Similar resources
PowerQ: An Interactive Keyword Search Engine for Aggregate Queries on Relational Databases
Keyword search over relational databases has gained popularity due to its ease of use. Current research has focused on the efficient computation of results from multiple tuples, and largely ignores queries to retrieve statistical information from databases. The work in [5] developed a system that allows aggregate functions to be expressed using simple keywords. However, this system may return i...
Ultra High Speed Packet Buffering using “Parallel Packet Buffer”
Modern switches and routers often use dynamic RAM (DRAM) in order to provide large buffer storage space. However, the effective bandwidth of DRAM is frequently a limiting factor in the design of high-speed switches and routers. The focus of this paper is to introduce a packet-buffering architecture called the parallel packet buffering (PPB), which increases the effective memory bandwidth signif...
Elf: Efficient lightweight fast stream processing at scale
Stream processing has become a key means for gaining rapid insights from webserver-captured data. Challenges include how to scale to numerous, concurrently running streaming jobs, to coordinate across those jobs to share insights, to make online changes to job functions to adapt to new requirements or data characteristics, and for each job, to efficiently operate over different time windows. Th...
Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees
Efficient methods for storing and querying are critical for scaling high-order m-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query ru...
Answering Keyword Queries involving Aggregates and GROUPBY on Relational Databases
Keyword search over relational databases has gained popularity as it provides a user-friendly way to explore structured data. Current research in keyword search has largely ignored queries to retrieve statistical information from the database. The work in [13] extends keywords by supporting aggregate functions in their SQAK system. However, SQAK does not consider the semantics of objects and re...
Publication date: 2012